Crate datafusion_physical_plan
Traits for physical query plan, supporting parallel execution for partitioned relations.
Re-exports§
pub use crate::display::DefaultDisplay;
pub use crate::display::DisplayAs;
pub use crate::display::DisplayFormatType;
pub use crate::display::VerboseDisplay;
pub use crate::metrics::Metric;
pub use crate::stream::EmptyRecordBatchStream;
Modules§
- Aggregates functionalities
- Defines the ANALYZE operator
- CoalesceBatchesExec combines small batches into larger batches for more efficient use of vectorized processing by upstream operators.
- Defines the merge plan for executing partitions in parallel and then merging the results into a single partition
- Defines common code used in execution plans
- Implementation of physical plan display. See crate::displayable for examples of how to format
- EmptyRelation with produce_one_row=false execution plan
- Defines the EXPLAIN operator
- Defines physical expressions that can be evaluated at runtime during query execution
- FilterExec evaluates a boolean predicate against all input batches to determine which rows to include in its output batches.
- Declaration of built-in (scalar) functions. This module contains built-in functions’ enumeration and metadata.
- Functionality used both on logical and physical plans
- Execution plan for writing data to DataSinks
- DataFusion Join implementations
- Defines the LIMIT plan
- Execution plan for reading in-memory batches of data
- Metrics for recording information about execution
- EmptyRelation produce_one_row=true execution plan
- Defines the projection execution plan. A projection determines which columns or expressions are returned from a query. The SQL statement SELECT a, b, a+b FROM t1 is an example of a projection on table t1 where the expressions a, b, and a+b are the projection expressions. SELECT without FROM will only evaluate expressions.
- Defines the recursive query plan
- This file implements the RepartitionExec operator, which maps N input partitions to M output partitions based on a partitioning scheme, optionally maintaining the order of the input rows in the output.
- Sort functionalities
- Stream wrappers for physical operators
- Generic plans for deferred execution: StreamingTableExec and PartitionStream
- Utilities for testing datafusion-physical-plan
- This module provides common traits for visiting or rewriting tree nodes easily.
- This module contains functions and structs supporting user-defined aggregate functions.
- UDF support
- The Union operator combines multiple inputs with the same schema
- Defines the unnest column plan for unnesting values in a column that contains a list type. Conceptually, this is like joining each row with all the values in the list column.
- Values execution plan
- Physical expressions for window functions
- Defines the work table query plan
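The repartitioning idea mentioned in the list above (N input partitions mapped to M output partitions) can be illustrated with a minimal round-robin sketch. This is not how RepartitionExec is implemented: `Batch` and `round_robin_repartition` are hypothetical names, plain vectors stand in for Arrow record batches, and the inputs are drained sequentially rather than concurrently.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i64>;

/// Distributes the batches of N input partitions across `m` output
/// partitions in round-robin order. Assumption: inputs are consumed one
/// after another, which a real operator would not do.
fn round_robin_repartition(inputs: Vec<Vec<Batch>>, m: usize) -> Vec<Vec<Batch>> {
    assert!(m > 0, "need at least one output partition");
    let mut outputs: Vec<Vec<Batch>> = vec![Vec::new(); m];
    let mut next = 0;
    for partition in inputs {
        for batch in partition {
            // Send the batch to the current output, then advance the cursor.
            outputs[next].push(batch);
            next = (next + 1) % m;
        }
    }
    outputs
}
```

Calling `round_robin_repartition` with two input partitions and `m = 3` yields three output partitions. Row order within a batch is untouched, but the relative order of batches across outputs is not preserved.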
Macros§
- The handle_async_state macro adapts the handle_state macro for use in asynchronous operations, particularly when dealing with Poll results within async traits like EagerJoinStream. It polls the asynchronous state-changing function using poll_unpin and then passes the result to handle_state for further processing.
- The handle_state macro is designed to process the result of a state-changing operation, encountered e.g. in implementations of EagerJoinStream. It operates on a StatefulStreamResult by matching its variants and executing corresponding actions. This macro is used to streamline code that deals with state transitions, reducing boilerplate and improving readability.
- Macro wraps Err($ERR) to add backtrace feature
Structs§
- Statistics for a column within a relation
- Stores certain, often expensive to compute, plan properties used in query optimization.
- Statistics for a relation. Fields are optional and can be inexact because the sources sometimes provide approximate estimates for performance reasons and the output of transformations is not always predictable.
- Global TopK
Enums§
- Represents the result of evaluating an expression: either a single ScalarValue or an ArrayRef.
- How data is distributed amongst partitions. See Partitioning for more details.
- Describes the execution mode of an operator’s resulting stream with respect to its size and behavior. There are three possible execution modes: Bounded, Unbounded and PipelineBreaking.
- Specifies how the input to an aggregation or window operator is ordered relative to its GROUP BY or PARTITION BY expressions.
- Output partitioning supported by ExecutionPlans.
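The "single scalar or array" result described in the list above can be modelled with a small enum. This is a sketch only: `Evaluated` and its `into_array` method are hypothetical, and `i64` and `Vec<i64>` stand in for ScalarValue and ArrayRef.

```rust
/// Hypothetical model of an expression result: one value shared by every
/// row, or one value per row.
#[derive(Debug, Clone, PartialEq)]
enum Evaluated {
    Scalar(i64),
    Array(Vec<i64>),
}

impl Evaluated {
    /// Materializes the result as one value per row. A scalar is repeated
    /// `num_rows` times; an array is returned unchanged.
    fn into_array(self, num_rows: usize) -> Vec<i64> {
        match self {
            Evaluated::Scalar(v) => vec![v; num_rows],
            Evaluated::Array(values) => values,
        }
    }
}
```

Keeping the scalar case separate lets a literal such as `1` in `A + 1` avoid being expanded to a full column until an operator actually needs one.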
Traits§
- Tracks an aggregate function’s state.
- An aggregate expression that:
- Represents nodes in the DataFusion Physical Plan.
- Extension trait that provides an easy API to fetch various properties of ExecutionPlan objects based on ExecutionPlan::properties.
- Trait that implements the Visitor pattern for a depth first walk of ExecutionPlan nodes. pre_visit is called before any children are visited, and then post_visit is called after all children have been visited.
- PhysicalExprs evaluate DataFusion expressions such as A + 1, or CAST(c1 AS int).
- Trait for types that stream arrow::record_batch::RecordBatch
- Common trait for window function implementations
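The depth-first visitor described in the list above (pre_visit before a node's children, post_visit after them) can be sketched with standard-library types only. `PlanNode`, `PlanVisitor`, `walk`, `Recorder`, and `sample_plan` are hypothetical names for illustration. The crate's own visitor trait works on ExecutionPlan nodes, and its exact signatures are not shown on this page.

```rust
/// Hypothetical plan node: a name plus child nodes.
struct PlanNode {
    name: &'static str,
    children: Vec<PlanNode>,
}

/// Simplified visitor: both hooks are infallible in this sketch.
trait PlanVisitor {
    fn pre_visit(&mut self, node: &PlanNode);
    fn post_visit(&mut self, node: &PlanNode);
}

/// Depth-first walk: pre_visit, then every child, then post_visit.
fn walk<V: PlanVisitor>(node: &PlanNode, visitor: &mut V) {
    visitor.pre_visit(node);
    for child in &node.children {
        walk(child, visitor);
    }
    visitor.post_visit(node);
}

/// Visitor that records the order in which the hooks fire.
struct Recorder {
    events: Vec<String>,
}

impl PlanVisitor for Recorder {
    fn pre_visit(&mut self, node: &PlanNode) {
        self.events.push(format!("pre {}", node.name));
    }
    fn post_visit(&mut self, node: &PlanNode) {
        self.events.push(format!("post {}", node.name));
    }
}

/// A three-node chain: Projection -> Filter -> Scan.
fn sample_plan() -> PlanNode {
    let scan = PlanNode { name: "Scan", children: vec![] };
    let filter = PlanNode { name: "Filter", children: vec![scan] };
    PlanNode { name: "Projection", children: vec![filter] }
}
```

Walking `sample_plan()` with a `Recorder` fires the pre hooks from the root down to the leaf and the post hooks from the leaf back up to the root.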
Functions§
- Visit all children of this plan, according to the order defined on ExecutionPlanVisitor.
- Execute the ExecutionPlan and collect the results in memory
- Execute the ExecutionPlan and collect the results in memory
- Return a wrapper around an ExecutionPlan which can be displayed in various easier to understand ways.
- Execute the ExecutionPlan and return a single stream of RecordBatches.
- Execute the ExecutionPlan and return a vec with one stream per output partition
- Utility function yielding a string representation of the given ExecutionPlan.
- Indicates whether a data exchange is needed for the input of plan, which is especially helpful for a distributed engine when judging whether it needs to deal with shuffling. Currently there are three kinds of execution plan that need data exchange: 1. RepartitionExec, for changing the partition number between two ExecutionPlans; 2. CoalescePartitionsExec, for collapsing all of the partitions into one without an ordering guarantee; 3. SortPreservingMergeExec, for collapsing all of the sorted partitions into one with an ordering guarantee.
- Applies an optional projection to a SchemaRef, returning the projected schema
- Recursively calls pre_visit and post_visit for this node and all of its children, as described on ExecutionPlanVisitor
- Returns a copy of this plan if we change any child according to the pointer comparison. The size of children must be equal to the size of ExecutionPlan::children().
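Applying an optional projection to a schema, as one of the functions listed above does, amounts to selecting fields by index. A minimal sketch, assuming a schema is just a list of field names: `project_fields` is a hypothetical helper, and the error-as-`String` return type is an assumption made for brevity, not the crate's error type.

```rust
/// Returns the fields selected by `projection`, in projection order.
/// `None` means "no projection", so every field is kept.
/// Assumption: an out-of-range index is reported as an error.
fn project_fields(
    fields: &[String],
    projection: Option<&[usize]>,
) -> Result<Vec<String>, String> {
    match projection {
        None => Ok(fields.to_vec()),
        Some(indices) => indices
            .iter()
            .map(|&i| {
                fields.get(i).cloned().ok_or_else(|| {
                    format!(
                        "projection index {} out of bounds for {} fields",
                        i,
                        fields.len()
                    )
                })
            })
            .collect(),
    }
}
```

Because the output follows the order of the indices, a projection can reorder columns as well as drop them.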
Type Aliases§
- Trait for a Stream of RecordBatches